One Dimensional Data Worksheet

This worksheet reviews the concepts discussed about one-dimensional data. The goal of these exercises is to get you thinking in terms of vectorized computing. The worksheet should take 20-30 minutes to complete.


In [1]:
import pandas as pd
import numpy as np

Exercise 1

Create a Series object with 100 random integers, then filter out the odd integers and reindex the Series. Hint: you can use np.random.randint(1, 101, 100) to create the random numbers (the upper bound is exclusive, so this draws integers from 1 to 100; the older np.random.random_integers is deprecated). Print out the first 20 numbers.


In [2]:
#First initialize the series by calling the pd.Series() function
randomNumbers = pd.Series( np.random.randint(1, 100, 100) )
#Display the first 5 random numbers
print( randomNumbers.head() )

#Next filter out the odd numbers by using the mod operator and reset the index
evenRandomNumbers = randomNumbers[ randomNumbers % 2 == 0].reset_index( drop=True )

#Display the first 5

evenRandomNumbers.head()


0    61
1    35
2    20
3    65
4    28
dtype: int64
Out[2]:
0    20
1    28
2    76
3    32
4    80
dtype: int64
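
The filter in the cell above packs three vectorized steps into one line. The sketch below separates them so each intermediate result can be inspected. The seeded generator (np.random.default_rng with an arbitrary seed of 0) and the variable names are assumptions made for illustration, not part of the worksheet.

```python
import numpy as np
import pandas as pd

# Seeded generator so the draw is repeatable; the seed value 0 is arbitrary.
rng = np.random.default_rng(0)
values = pd.Series(rng.integers(1, 101, size=100))  # upper bound is exclusive

# Step 1: the comparison runs on every element at once and yields a boolean Series.
is_even = values % 2 == 0

# Step 2: indexing with the mask keeps only the rows where it is True,
# but the surviving rows still carry their original labels.
evens_original_labels = values[is_even]

# Step 3: reset_index(drop=True) relabels the result 0..n-1 and discards the old labels.
evens = evens_original_labels.reset_index(drop=True)

print(evens.head())
```

Without drop=True, reset_index() would return a DataFrame that keeps the old labels in an extra column instead of a plain Series.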

Exercise 2

You will be given a list containing 10 strings. Create a new Series called validPhoneNumbers that only contains data in the format (XXX)XXX-XXXX. Don't forget to reindex the series after you've filtered it.


In [3]:
numbers = ['(342)123-2345', '410-342-3421', '(234 434-2121', '(301)822-3423', '123-234-3423', '(410)555-4443', 'AAAAHHH', '(XXX)XXX-XXXX', '(602)123-4535', '(234)127-4534']

In [4]:
#Predefined list of numbers
numbers = ['(342)123-2345', '410-342-3421', '(234 434-2121', '(301)822-3423', '123-234-3423', '(410)555-4443', 'AAAAHHH', '(XXX)XXX-XXXX', '(602)123-4535', '(234)127-4534']

#Create the phone numbers series
phoneNumbers = pd.Series( numbers )

#Next filter the phone numbers by using the str.match function
#str.match anchors only at the start, so the trailing $ rejects extra characters after the number
validPhoneNumbers = phoneNumbers[ phoneNumbers.str.match( r'\(\d{3}\)\d{3}-\d{4}$') ].reset_index( drop=True )
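
One detail worth checking when filtering with a regular expression: str.match only requires the pattern to match at the beginning of each string, so the end has to be pinned explicitly, either with a trailing $ or with str.fullmatch (available in pandas 1.1 and later). The sketch below compares the two on a few hypothetical values that are not in the worksheet's list.

```python
import pandas as pd

pattern = r'\(\d{3}\)\d{3}-\d{4}'

# Hypothetical test values, chosen to include one number with an extra trailing digit.
samples = pd.Series(['(342)123-2345', '(342)123-23456', '410-342-3421'])

# str.match: the pattern must match starting at the first character,
# but anything may follow the matched part.
prefix_ok = samples.str.match(pattern)

# str.fullmatch: the entire string must match the pattern.
whole_ok = samples.str.fullmatch(pattern)

strict = samples[whole_ok].reset_index(drop=True)

print(prefix_ok.tolist())
print(whole_ok.tolist())
print(strict.tolist())
```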

Exercise 3

The code below contains a lambda function that converts a temperature from Fahrenheit to Celsius. You are given a Series called tempsInFarenheit containing temperatures in Fahrenheit. Using the .apply() function, convert the data into degrees Celsius.


In [5]:
#This function converts a number from Fahrenheit to Celsius
toCelsius = lambda x: (float(5)/9)*(x-32)

#Creates a Series with numbers that represent temperatures in Fahrenheit
tempsInFarenheit = pd.Series( [92,33,-5,17,122,87 ])

In [6]:
#Your code here...
tempsinCelsius = tempsInFarenheit.apply( toCelsius )
print( tempsinCelsius)


0    33.333333
1     0.555556
2   -20.555556
3    -8.333333
4    50.000000
5    30.555556
dtype: float64
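
Since the goal of the worksheet is vectorized thinking, it may help to see that this particular conversion does not need .apply() at all: arithmetic operators act on every element of a Series at once. The following is a minimal sketch with illustrative variable names (temps_f, temps_c) that are not part of the worksheet.

```python
import pandas as pd

temps_f = pd.Series([92, 33, -5, 17, 122, 87])

# Arithmetic operators work element-wise on the whole Series,
# so no per-element function call is needed.
temps_c = (temps_f - 32) * 5 / 9

# The same conversion via .apply(), for comparison.
temps_c_apply = temps_f.apply(lambda x: (5 / 9) * (x - 32))

print(temps_c)
```

.apply() remains the right tool when the per-element logic cannot be written with Series operators, as with the ipaddress check in Exercise 5.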

Exercise 4

You are given a list of numbers called numList. Without using a loop, write a script to count the occurrences of each value in the list.


In [7]:
numList = [1,1,1,1,1,2,4,5,7,5,4,5,6,4,3,5,5,5,6,9,0,7,6,7,5,4,4,7]

In [8]:
#Your code here...
numSeries = pd.Series( numList)
numSeries.value_counts()


Out[8]:
5    7
4    5
1    5
7    4
6    3
9    1
3    1
2    1
0    1
dtype: int64
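
value_counts() orders its result by frequency, which is why the index above looks shuffled. As a small sketch of two common follow-ups (the variable names are illustrative), sort_index() reorders the counts by value, and normalize=True reports each value's share of the total instead of a raw count.

```python
import pandas as pd

numList = [1,1,1,1,1,2,4,5,7,5,4,5,6,4,3,5,5,5,6,9,0,7,6,7,5,4,4,7]
counts = pd.Series(numList).value_counts()

# Reorder the counts by the value itself rather than by frequency.
counts_by_value = counts.sort_index()

# Fraction of the list taken up by each value; the fractions sum to 1.
shares = pd.Series(numList).value_counts(normalize=True)

print(counts_by_value)
print(shares.head(3))
```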

Exercise 5

You are given a Series of IP addresses, and the goal is to limit this data to private IP addresses. Python has an ipaddress module that lets you create, manipulate, and operate on IPv4 and IPv6 addresses and networks. Complete documentation is available here: https://docs.python.org/3/library/ipaddress.html.

Here are some examples of how you might use this module:

import ipaddress
myIP = ipaddress.ip_address( '192.168.0.1' )
myNetwork = ipaddress.ip_network( '192.168.0.0/28' )

#Check membership in network
if myIP in myNetwork:  #This works
    print( "Yay!" )

#Loop through CIDR blocks
for ip in myNetwork:
    print( ip )

192.168.0.0
192.168.0.1


192.168.0.13
192.168.0.14
192.168.0.15

#Testing to see if an IP is private
if myIP.is_private:
    print( "This IP is private" )
else:
    print( "Routable IP" )
  1. First, write a function which takes an IP address and returns True if the IP is private, False if it is public. HINT: use the ipaddress module.
  2. Next, use this function to create a Series of True/False values in the same order as your original Series.
  3. Finally, use that boolean Series to filter the original Series so that it contains only private IP addresses.

In [9]:
import ipaddress
hosts = [ '192.168.1.2', '10.10.10.2', '172.143.23.34', '34.34.35.34', '172.15.0.1', '172.17.0.1']

In [10]:
from ipaddress import ip_address 
IPData = pd.Series( hosts )
privateIPs = IPData[IPData.apply( lambda x : ip_address(x).is_private ) ]
print( privateIPs )


0    192.168.1.2
1     10.10.10.2
5     172.17.0.1
dtype: object
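
The one-line solution above folds the three steps of the exercise into a single expression. The sketch below spells them out with a named function and a separate boolean mask. The names is_private_ip, private_mask, and private_hosts are illustrative, and treating strings that are not valid IP addresses as not private is an assumption added here, not something the exercise specifies.

```python
import ipaddress
import pandas as pd

def is_private_ip(address):
    """Return True if the address string is a private IP, False otherwise."""
    try:
        return ipaddress.ip_address(address).is_private
    except ValueError:
        # Assumption: a string that is not a valid IP address counts as not private.
        return False

hosts = pd.Series(['192.168.1.2', '10.10.10.2', '172.143.23.34',
                   '34.34.35.34', '172.15.0.1', '172.17.0.1'])

# Step 2: a True/False Series in the same order as the original.
private_mask = hosts.apply(is_private_ip)

# Step 3: keep only the rows where the mask is True.
private_hosts = hosts[private_mask]

print(private_mask.tolist())
print(private_hosts)
```

Keeping the mask in its own variable also makes it easy to select the public addresses with hosts[~private_mask].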